https://Carolinah23.github.io

Final Tutorial - Analysing data from Water Wells in Baton Rouge, LA

Diana Carolina Hurtado Pulido
December/16/2021

CMPS 3160/6160

Project Description and Goals

Currently, I am working on finding subsidence rates in Baton Rouge, Louisiana during the last two decades, and investigating what factors are causing subsidence using LiDAR (Light Detection And Ranging) data from 1999 and 2018.

Subsidence is the vertical land movement caused by natural and anthropogenic factors such as sediment compaction, isostatic adjustments, fault slip, and extraction or injection of fluids. The Gulf of Mexico coastline is under constant monitoring due to high rates of sea level rise and subsidence, which cause rapid land loss. The study area is not on the coast, yet it is subsiding (Figure 1). This area is of particular interest because there has not been enough research to determine what factors are causing subsidence. It has two geological faults, growing urban development, almost 2000 water wells (with different uses) active during the study period, approximately 40 active oil and gas wells, and 11 injection wells.

So far, my results show subsidence in the whole region (Figure 1). Surprisingly, subsidence increases from south to north, which is the opposite of the expected result: areas closer to the coast and to water bodies usually have larger subsidence rates because younger sediments compact and, most importantly in this area, the faults move towards the south. Interestingly, small areas show localized subsidence and uplift; this behavior is likely related to human activities. Given these results, the main goal of this project is to find out how these subsidence values are related to groundwater extraction. Important questions:

  1. Do clusters of wells coincide with areas that are subsiding or uplifting locally?
  2. Are deeper wells causing more subsidence than wells at shallow depths?
  3. Is there any particular well use that may be causing more vertical changes?

Figure 1: Relative subsidence in Baton Rouge from LiDAR differencing between 1999 and 2018. Each map shows a different method applied to find elevation changes (result of my research).

Extract, Transform, and Load (ETL)

Data description

Datasets 1 and 2 come from the Department of Natural Resources of Louisiana. Dataset 1 (Wells_df) corresponds to the water wells operating in the area during the study period (1999 to 2018). A collaborator and I collected this data last year (2020) and calculated the well depth in meters. The data is not complete: some values were not published when the well owners uploaded the information, and in some cases the information is too old. Dataset 2 (locations_df) contains the coordinates in NAD83(2011) / UTM zone 15N and other data for the water wells; only some general information will be kept for this analysis.
Dataset 3 (varZ_df) contains the elevation changes between 1999 and 2018 calculated in my research, on a 100-meter grid in the same coordinate system as locations_df.

For my analysis I will use the following variables in Wells_df: well depth (in meters), well use, yield (rate of extraction, in gallons per minute), the date of construction, and the last date the well was active.
From locations_df: coordinates X and Y, LocalWellNumber (to merge the data with the first dataset), water table depth (how deep underground water is found, in meters), and aquifer name.
From varZ_df: coordinates X and Y, and the average elevation change at each point.

Loading data:

I am loading the data using read_csv().
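A minimal sketch of the loading step. The real file names are not given in this tutorial, so the example reads from an in-memory string instead of a file; the WellUse column name and the two rows are assumptions made only for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk. The rows are invented for illustration,
# and "WellUse" is an assumed column name.
wells_csv = io.StringIO(
    "LocalWellNumber,Well_Depth_Meters,WellUse,Active\n"
    "EB-1,45.7,Domestic,Y\n"
    "EB-2,210.3,Public supply,N\n"
)

# With a real file this would be pd.read_csv("wells.csv") (hypothetical path).
Wells_df = pd.read_csv(wells_csv)
print(Wells_df.head())
```

The same call would be repeated for locations_df and varZ_df with their own files.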

Tidy data

  1. Dropping unused variables
  2. Replacing incorrect or empty values with NaN
  3. Converting data to appropriate types
  4. Renaming and organizing columns

Wells_df

Wells_df has two variables that I will not use: WellDepth and SerialNumber. I will not use the first because I will use the depth in meters instead, and the second is not the main identifier of the observations and is incomplete.

There are "NN" values in LastActive_Plugged_date and DateConstructed that must be changed to NaN using np.nan.
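One way this replacement could look, sketched on a two-row stand-in for Wells_df (the dates are invented; only the column names come from the text):

```python
import numpy as np
import pandas as pd

# Tiny stand-in for Wells_df with the "NN" placeholder in both date columns.
wells = pd.DataFrame({
    "LastActive_Plugged_date": ["2014-05-01", "NN"],
    "DateConstructed": ["NN", "1985-03-12"],
})

# Replace the placeholder with a proper missing value in both columns.
date_cols = ["LastActive_Plugged_date", "DateConstructed"]
wells[date_cols] = wells[date_cols].replace("NN", np.nan)
print(wells.isna())
```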

The cell below shows the variable types. I converted LastActive_Plugged_date and DateConstructed to datetime, and Well_Depth_Meters to a numeric type.
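A sketch of the type conversion on invented values. Using errors="coerce" is my assumption here: it turns anything that cannot be parsed into a missing value instead of raising an error.

```python
import pandas as pd

# Stand-in for Wells_df; the values are invented for illustration.
wells = pd.DataFrame({
    "DateConstructed": ["1985-03-12", None],
    "LastActive_Plugged_date": ["2014-05-01", "2001-01-15"],
    "Well_Depth_Meters": ["45.7", "unknown"],
})

# Parse the two date columns; unparseable entries become NaT.
for col in ["DateConstructed", "LastActive_Plugged_date"]:
    wells[col] = pd.to_datetime(wells[col], errors="coerce")

# Parse the depth; unparseable entries become NaN.
wells["Well_Depth_Meters"] = pd.to_numeric(wells["Well_Depth_Meters"], errors="coerce")
print(wells.dtypes)
```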

For future models, it is better to have the length of time that a well was active. This variable is the difference between the LastActive_Plugged_date and DateConstructed variables.
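The difference between two datetime columns can be sketched as follows. The new column name Active_days is hypothetical, and the dates are invented.

```python
import pandas as pd

# Stand-in for Wells_df with already-parsed dates; the second well has no end date.
wells = pd.DataFrame({
    "DateConstructed": pd.to_datetime(["2001-01-01", "1985-03-12"]),
    "LastActive_Plugged_date": pd.to_datetime(["2014-01-01", None]),
})

# Subtracting datetimes gives a timedelta; .dt.days extracts whole days.
# Rows with a missing date stay missing.
wells["Active_days"] = (
    wells["LastActive_Plugged_date"] - wells["DateConstructed"]
).dt.days
print(wells)
```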

Finally, I am replacing Y and N (yes and no) with 1 and 0 in the Active column. This variable indicates whether the well was actively extracting groundwater on the date we collected the data. For instance, if the well was active between 2001 and 2014 it should have an N; likewise, a well that is being used to monitor groundwater should be marked as inactive.
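A small sketch of the recoding with Series.map, on invented values:

```python
import pandas as pd

# Stand-in for the Active column of Wells_df.
wells = pd.DataFrame({"Active": ["Y", "N", "Y"]})

# Map the flags to integers; any value other than Y or N would become NaN.
wells["Active"] = wells["Active"].map({"Y": 1, "N": 0})
print(wells)
```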

locations_df

First, I will convert the water table depth to meters because it is given in feet. With this variable in place, I will drop the variables that are not necessary for this analysis and keep only the variables mentioned above.
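A sketch of the unit conversion and the drop, using the exact factor 1 ft = 0.3048 m. The column names WaterTableDepth_ft, WaterTableDepth_m and Owner are hypothetical stand-ins, since the original column names are not listed here.

```python
import pandas as pd

FEET_TO_METERS = 0.3048  # exact by definition

# Stand-in for locations_df; column names other than LocalWellNumber are assumed.
locations = pd.DataFrame({
    "LocalWellNumber": ["EB-1", "EB-2"],
    "WaterTableDepth_ft": [100.0, 250.0],
    "Owner": ["a", "b"],
})

# Convert to meters, then drop the columns that are no longer needed.
locations["WaterTableDepth_m"] = locations["WaterTableDepth_ft"] * FEET_TO_METERS
locations = locations.drop(columns=["WaterTableDepth_ft", "Owner"])
print(locations)
```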

Now, I rename the identifier column so that both tables share the same column name, and I reorder the columns for readability.

varZ_df

Exploratory Data Analysis

For this second milestone, I am doing some analysis to see how my data is distributed across different variables.

First I will merge the table of well information with the table of the wells' locations. These dataframes have different sizes because locations_df has information on all the wells that have existed in the area since data collection began, while Wells_df only has data for wells that were active at some point between 1999 and 2018.
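A sketch of the merge on the shared LocalWellNumber column. The left join is an assumption on my part: it keeps every well in Wells_df and attaches its location, which matches the idea that locations_df is the larger table. The coordinates are invented.

```python
import pandas as pd

# Stand-ins for the two tables; locations_df has one extra well.
Wells_df = pd.DataFrame({
    "LocalWellNumber": ["EB-1", "EB-2"],
    "Well_Depth_Meters": [45.7, 210.3],
})
locations_df = pd.DataFrame({
    "LocalWellNumber": ["EB-1", "EB-2", "EB-3"],
    "X": [675000.0, 680000.0, 687500.0],
    "Y": [3360000.0, 3370000.0, 3380000.0],
})

# Keep every row of Wells_df and attach the matching coordinates.
Wells_located_df = Wells_df.merge(locations_df, on="LocalWellNumber", how="left")
print(Wells_located_df)
```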

Now, I want to know how well depth and water table depth are distributed. My first guess is that they should follow a similar distribution, because if the wells are meant to extract water, they should be at least as deep as the water table, or maybe a little deeper.
Both variables are skewed to the right, so I applied a transformation to the variables in the graph using the Ladder of Powers with an exponent of 0.3 in both cases. For example, a well with a depth of 10 meters will appear in the graph as 1.995...
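The power transformation and its inverse can be sketched with numpy. The depths below are example values, not rows from the dataset.

```python
import numpy as np

EXPONENT = 0.3  # Ladder of Powers exponent used for both variables

# Example depths in meters (illustrative values only).
depths_m = np.array([10.0, 40.0, 102.0])

# Forward transform used for plotting.
transformed = depths_m ** EXPONENT

# Inverse transform, to read a value on the transformed axis back in meters.
back_to_meters = transformed ** (1 / EXPONENT)

print(transformed)
print(back_to_meters)
```

Reading the axis backwards works the same way: a value of 4 on the transformed axis corresponds to 4 ** (1 / 0.3), which is about 102 meters.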

The graph above shows that the two variables do not have distributions as similar as I expected. Water table depth is concentrated between 0 and 4 (0-102 meters depth), but the majority of wells only reach 40 meters, meaning that many do not reach the water table. Maybe the wells that do not reach the water table are used for other purposes, or one (or both) of the datasets may not have been filled in correctly in the first place. We found that the well data is not well maintained. There are definitely more water wells at shallower depths, where the sediment may be less compacted. The study area is on stable sediments, but changes in volume are still possible.


Now, I want to know how well installation and deactivation are distributed over time. Are old wells still active? When were the wells installed, and how long have they been there?
The following graph shows that the construction of wells increased greatly in the '80s and has decreased recently; however, many of these wells are still active. Comparing the wells installed before 1980 with the wells last active between 1980 and 1990, I can see that the rates of installation and deactivation are not even close.


The following graph shows how deep the wells that exist (or were active during the study period) are for each use.
Public supply has the deepest wells; these wells provide water to water and sewage users in East Baton Rouge. Next come observation wells and industrial wells. Most of the wells with these uses are therefore in areas where the water table is deep, even though such depths are uncommon (first graph).
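As a numerical companion to the graph, the depth per use can be summarized with a groupby. This is only a sketch: the WellUse column name is assumed, the depths are invented, and the median is my choice of summary rather than something taken from the original notebook.

```python
import pandas as pd

# Invented depths, chosen only to illustrate the grouping.
wells = pd.DataFrame({
    "WellUse": ["Public supply", "Public supply", "Domestic", "Domestic", "Industrial"],
    "Well_Depth_Meters": [600.0, 500.0, 30.0, 50.0, 300.0],
})

# Median depth for each use, deepest first.
depth_by_use = (
    wells.groupby("WellUse")["Well_Depth_Meters"]
    .median()
    .sort_values(ascending=False)
)
print(depth_by_use)
```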


Now, I want to see how subsidence changes from south to north and from west to east. This graph will help us identify spatial trends that may not be easily observable in the map shown above.

The graph shows that there is more variation in elevation change from south to north than from west to east. Looking at the top graph alone, the northern area reaches changes of -0.25 m (25 centimeters), which is unexpected given that the southern area should have the more negative elevation changes. Along the horizontal coordinate, elevation change seems more stable, except for the strong bumps at coordinate 687500, meaning that subsidence varies more in the south-north direction than in the west-east direction. This pattern has also been studied with permanent GPS stations in the area.


In the following graph, I want to see how the wells are distributed in the area using the X and Y coordinates, and how deep they are. In the top graph, we see that deeper wells are located to the north, where there is more subsidence (strong negative elevation changes in the graph above). Most of the wells have a depth of less than 200 meters. Is this a hint that deeper wells relate to higher values of subsidence? There are also more wells in the northern area than in the south, which may point the same way.
On the other hand, the location of the wells along the X coordinate does not seem to show any relation with the elevation changes seen previously.

Model Questions

The analysis above brings up two important relations that I will examine using statistical modeling: 1) the relationship between the calculated elevation changes and the location of water wells from south to north, where wells at greater depths could apparently cause more subsidence, and 2) public supply, industrial, and observation wells are the deepest wells, so one of these uses may be causing more elevation change.

To answer these questions I will do a statistical analysis to find the correlation between depth and elevation change over the study period. I will compare these results with the distribution of uses to try to point out which use causes more change.

I will also use the Yield variable to find out whether the extraction rate is linked in some way to subsidence or whether depth is more important. This correlation analysis is limited because only 185 of the 1972 wells in the area have this information.

Modeling

For this model I will check whether there is any correlation between the elevation change values calculated in my research and location from north to south, and with well depth.

Secondly, I will build a regression model in which I can incorporate variables such as well use and the time that the well has been operating.

Note: For the sake of this project I am only using well variables to explain elevation changes, but subsidence may be caused by other factors such as fault slip, compaction of sediments, and isostatic adjustments, and by the interaction of these.
To simplify calculations, I assign a subsidence value to each well by using its coordinates to find the closest subsidence pixel.
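The nearest-pixel assignment can be sketched with a brute-force distance computation in numpy. This is an illustration on a four-pixel grid with invented values, not necessarily the method used in the original notebook.

```python
import numpy as np

# Hypothetical 100 m grid with one elevation-change value per pixel.
pixels_xy = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
pixels_varz = np.array([-0.05, -0.10, -0.20, -0.25])

# Hypothetical well coordinates in the same coordinate system.
wells_xy = np.array([[10.0, 20.0], [95.0, 80.0]])

# Squared distance from every well to every pixel: shape (n_wells, n_pixels).
diff = wells_xy[:, None, :] - pixels_xy[None, :, :]
dist2 = (diff ** 2).sum(axis=2)

# Index of the closest pixel for each well, and its elevation change.
nearest = dist2.argmin(axis=1)
wells_varz = pixels_varz[nearest]
print(nearest, wells_varz)
```

The full distance matrix grows with the number of wells times the number of pixels, so for a large grid a spatial index such as scipy.spatial.cKDTree may be a better fit.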

1. Correlation analysis

The results below show that there is no strong correlation between location or well depth and elevation changes; in fact, these results suggest a negative correlation between the X coordinate and elevation changes. From figure 6 we can see that only some wells have large depths (>100 meters), exactly 283 wells out of 1886 (15%). Because most of the wells are at shallow depths and subsidence is still happening due to other factors, these few wells do not have a strong effect on the general picture.

Finally, as an extra metric I calculate the correlation between well depth and water table depth to get a numerical description of figure 2. These data seem to be well correlated, though in my opinion they should be more similar.
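A sketch of how such correlations can be computed with DataFrame.corr(), which uses the Pearson coefficient by default. The numbers are invented, and VarZ is deliberately built as a perfect linear function of Y so that the expected value is easy to check.

```python
import pandas as pd

# Invented values; VarZ decreases linearly with Y on purpose.
df = pd.DataFrame({
    "Y": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Well_Depth_Meters": [30.0, 45.0, 40.0, 120.0, 200.0],
    "VarZ": [-0.02, -0.04, -0.06, -0.08, -0.10],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(3))
```

A value near -1 between Y and VarZ here only reflects how the toy data was built; with the real data the coefficients are expected to be much weaker.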

2. Regression Models

For this regression model I am using almost all the variables in the dataframe to describe the elevation changes, VarZ. I divided the dataset Wells_located_df into two datasets of equal size and dropped the rows with NaN values, because they cause problems when running the model. Finally, I calculated the root mean square error to measure how far the predictions are from the real data. The error is close to 0.25 m; considering that elevation change values should be very small, I consider this error quite large. From the correlation analysis I already knew that the independent variables do not explain this phenomenon well.
I tried three more models with fewer variables.
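The workflow described above (drop NaN rows, split in half, fit, compute the RMSE) can be sketched with ordinary least squares in numpy. This is a stand-in on synthetic data: the original notebook's modeling library and features are not shown here, and every number below is generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic features and target (not real well data).
depth = rng.uniform(5, 300, n)
y_coord = rng.uniform(0, 10_000, n)
varz = -0.05 - 0.0002 * depth - 0.00001 * y_coord + rng.normal(0, 0.02, n)

# Introduce a few missing depths, then drop those rows.
features = np.column_stack([depth, y_coord])
features[:5, 0] = np.nan
keep = ~np.isnan(features).any(axis=1)
features, target = features[keep], varz[keep]

# Design matrix with an intercept column, then a random split into two halves.
X = np.column_stack([np.ones(len(target)), features])
idx = rng.permutation(len(target))
half = len(target) // 2
train, test = idx[:half], idx[half:]

# Fit on one half, predict on the other, and measure the RMSE.
coef, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
pred = X[test] @ coef
rmse = np.sqrt(np.mean((target[test] - pred) ** 2))
print(f"RMSE: {rmse:.3f} m")
```

The RMSE is in the same units as VarZ (meters), which is what makes it directly comparable with the size of the elevation changes.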

Model 2: This model only has variables that are explicitly characteristics of the wells.

Model 3: This model only has variables that are explicitly characteristics of the study area.

Model 4: This model only uses well use.

Conclusions

  1. Do clusters of wells coincide with areas that are subsiding or uplifting locally?

There are more wells in the northern area and there is more subsidence in the northern area, so considering the complete study region, yes. At local scales, sadly, it is not possible to say from this analysis.

  2. Are deeper wells causing more subsidence than wells at shallow depths?

There is a really weak relation between elevation changes and well depths, yet it is positive. One well may affect nearby areas and not just the ground under it, so I will need to see whether elevation change close to each well varies more than in areas without wells. There is definitely a better relation between subsidence and well features such as well depth, well use, and active days.

  3. Is there any particular well use that may be causing more vertical changes?

There does not seem to be a strong relation between well use and subsidence.

Challenges for the future